10 research outputs found

    Reconnaissance de l’écriture manuscrite avec des réseaux récurrents

    Mass digitization of paper documents requires highly efficient optical character recognition systems. Digital versions of paper documents enable search through keyword detection and the extraction of high-level information (e.g., titles, authors, dates). Unfortunately, writing recognition systems, and handwriting recognition systems in particular, still fall well short of human performance on the most difficult documents, which limits or hinders some applications. This industrial PhD (CIFRE) between Airbus DS and the LITIS, carried out within the time frame of the MAURDOR project, aims to identify and improve state-of-the-art systems for handwriting recognition.
    We first compare different systems for handwriting recognition. Our comparisons cover various feature sets as well as various dynamic classifiers: i) Hidden Markov Models (HMM), ii) a hybrid neural network/HMM, iii) a hybrid recurrent network Bidirectional Long Short-Term Memory - Connectionist Temporal Classification (BLSTM-CTC)/HMM, and iv) a hybrid Conditional Random Fields (CRF)/HMM. We compare these systems within the framework of the WR2 task of the ICDAR 2009 competition, namely an isolated word recognition task using a 1600-word lexicon. Our results rank the BLSTM-CTC/HMM system as the best performing, and clearly show that BLSTM-CTCs trained on different features are complementary.
    Our second contribution exploits this complementarity. We explore various combination strategies that operate at different levels of the BLSTM-CTC architecture: low level (early fusion at the input), mid level (within the network), and high level (late integration at the output). Here again we measure performance on the WR2 task of the ICDAR 2009 competition. Overall, our combination strategies improve on the single-feature systems, and our best combination results are close to those of the state-of-the-art system on the same task. We have also observed that some of our combinations are better suited to systems that use a lexicon to correct mistakes, while others are better suited to systems with no lexicon.
    Our third contribution focuses on two tasks related to handwriting recognition. We present two systems, one designed for language recognition, the other for keyword detection, from either a text query or an image query. For both tasks our systems stand out from the literature because they use a handwriting recognition step. Indeed, most systems in the literature extract image features for classification or comparison, which is not necessarily the approach best suited to these tasks. Our systems use a handwriting recognition step followed by either a language detection step or a word detection step, depending on the application.
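As an illustration of how a BLSTM-CTC output can be turned into text, here is a minimal sketch of best-path (greedy) CTC decoding. It assumes a hypothetical per-frame posterior matrix and a toy alphabet whose index 0 is the CTC blank; the thesis couples the network with an HMM and a lexicon, which this sketch does not attempt to reproduce.

```python
def ctc_best_path(posteriors, alphabet, blank=0):
    """Greedy CTC decoding: take the top label per frame,
    collapse consecutive repeats, then drop blanks."""
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in posteriors]
    decoded = []
    previous = None
    for label in best:
        # A label is emitted only when it differs from the previous frame
        # and is not the blank symbol.
        if label != previous and label != blank:
            decoded.append(alphabet[label])
        previous = label
    return "".join(decoded)


# Toy alphabet (assumption): index 0 is the blank label.
ALPHABET = ["<blank>", "a", "b"]

# Hypothetical posteriors for five frames.
frames = [
    [0.10, 0.80, 0.10],  # a
    [0.10, 0.70, 0.20],  # a again (repeat, collapsed)
    [0.90, 0.05, 0.05],  # blank (separates the two a's)
    [0.20, 0.70, 0.10],  # a
    [0.10, 0.20, 0.70],  # b
]

print(ctc_best_path(frames, ALPHABET))  # prints "aab"
```

The blank frame is what allows the doubled letter to survive the collapse step; without it the two runs of "a" would merge into one.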

    Benchmarking discriminative approaches for word spotting in handwritten documents

    In this article, we benchmark the most popular methods for word spotting in handwritten documents. The benchmark includes a pure HMM approach as well as the hybrid discriminative methods MLP-HMM, CRF-HMM, RNN-HMM and BLSTM-CTC-HMM. This study lets us observe the performance gain provided by each discriminative stage over the pure generative HMM approach. Moreover, we highlight the different abilities of these discriminative stages, from the simplest MLP to the most complex, the current state-of-the-art BLSTM-CTC. We also propose a more specific and original study of BLSTM-CTC, showing that, when used as a lexicon-free recognizer, it can reach very interesting word-spotting performance.

    A Hybrid CRF/HMM Approach for Handwriting Recognition

    In this article, we propose an original hybrid CRF-HMM system for handwriting recognition. The main idea is to benefit from both the discriminative ability of the CRF and the modeling ability of the HMM. The CRF stage is devoted to discriminating low-level frame representations, while the HMM performs lexicon-driven word recognition. Low-level frame representations are defined using n-gram codebooks and HOG descriptors. The system is trained and tested on the public handwritten word database RIMES.

    BLSTM-CTC Combination Strategies for Off-line Handwriting Recognition

    In this paper we present several combination strategies using multiple BLSTM-CTC systems. Given several feature sets, our aim is to determine which strategies are the most relevant for improving an isolated word recognition task (the WR2 task of the ICDAR 2009 competition) using a BLSTM-CTC architecture. We explore different combination levels: early integration (feature combination), mid-level combination and late fusion (output combination). Our results show that several combinations outperform single-feature BLSTM-CTCs.
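To make the difference between early and late combination concrete, here is a minimal sketch under stated assumptions. The function names, the toy feature values and the word scores are all hypothetical; the paper's actual fusion rules are not given in the abstract, so a plain concatenation and a weighted sum of log scores stand in for them.

```python
def early_fusion(features_a, features_b):
    """Early integration: concatenate the per-frame feature vectors of two
    feature sets so that a single network sees both."""
    if len(features_a) != len(features_b):
        raise ValueError("both feature sets must cover the same frames")
    return [fa + fb for fa, fb in zip(features_a, features_b)]


def late_fusion(scores_a, scores_b, weight=0.5):
    """Late fusion: combine the per-word log scores of two separately trained
    recognizers and return the best-scoring word they both propose."""
    common = scores_a.keys() & scores_b.keys()
    combined = {
        word: weight * scores_a[word] + (1.0 - weight) * scores_b[word]
        for word in common
    }
    return max(combined, key=combined.get)


# Two hypothetical feature sets for a two-frame word image.
pixel_features = [[0.1, 0.4], [0.3, 0.2]]
gradient_features = [[0.9], [0.7]]
fused_frames = early_fusion(pixel_features, gradient_features)

# Hypothetical log scores from two single-feature recognizers.
system_a = {"chat": -1.0, "chez": -1.2}
system_b = {"chat": -2.0, "chez": -0.5}
best_word = late_fusion(system_a, system_b)

print(fused_frames)  # [[0.1, 0.4, 0.9], [0.3, 0.2, 0.7]]
print(best_word)     # chez
```

In this toy case the two systems disagree on the top word, and the combined score follows the system that is more confident. Mid-level combination happens inside the network and is not sketched here.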

    Spotting Handwritten Words and REGEX using a two stage BLSTM-HMM architecture

    In this article, we propose a hybrid model for spotting words and regular expressions (REGEX) in handwritten documents. The model combines a state-of-the-art BLSTM (Bidirectional Long Short-Term Memory) neural network, which recognizes and segments characters, with an HMM that builds line models able to spot the desired sequences. Experiments on the RIMES database show very promising results.

    Using BLSTM for Spotting Regular Expressions in Handwritten Documents


    Exploring multiple feature combination strategies with a recurrent neural network architecture for off-line handwriting recognition

    The BLSTM-CTC is a novel recurrent neural network architecture that has outperformed previous state-of-the-art algorithms in tasks such as speech recognition and handwriting recognition. It can process long-term dependencies in temporal signals in order to label unsegmented data. This paper describes different ways of combining features using a BLSTM-CTC architecture. We explore not only low-level combination (feature space combination) but also high-level combination (decoding combination) and mid-level combination (internal system representation combination). The results are compared on the RIMES word database. Our results show that the low-level combination works best, thanks to the powerful data modeling of the LSTM neurons.

    Language identification from handwritten documents

    This paper presents a novel approach for language identification in handwritten documents. The approach is based on script identification followed by character recognition. BLSTM-CTC based handwriting recognizers are used, and the OCR output is fed to a statistical language identifier that detects the language of the input handwritten document. Documents in two scripts (Latin and Bengali) and four languages (English, French, Bengali and Assamese) are considered for evaluation. Several alternative frameworks have been explored, and the effects of handwriting recognition and text length on language detection have been studied. We observe that, with some empirical restrictions, it is possible to achieve more than 80% language detection accuracy, and that practical systems can be designed on the basis of the current research.
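The abstract does not say which statistical language identifier is used. As one plausible stand-in, here is a minimal sketch of a character-trigram identifier applied to recognized text. The training sentences, function names and add-one smoothing are assumptions made for illustration, and the script identification and handwriting recognition stages are not modeled.

```python
import math
from collections import Counter


def trigram_counts(text):
    """Count character trigrams of the lowercased text, padded with spaces."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def train(samples):
    """Build one trigram model per language from {language: text}."""
    return {language: trigram_counts(text) for language, text in samples.items()}


def identify(text, models):
    """Return the language whose add-one-smoothed trigram model gives the
    recognized text the highest log-likelihood."""
    query = trigram_counts(text)

    def log_likelihood(model):
        total = sum(model.values())
        vocabulary = len(model) + 1  # one extra slot for unseen trigrams
        return sum(
            count * math.log((model[gram] + 1) / (total + vocabulary))
            for gram, count in query.items()
        )

    return max(models, key=lambda language: log_likelihood(models[language]))


# Tiny hypothetical training texts; a real system would use large corpora.
models = train({
    "english": "the quick brown fox jumps over the lazy dog and the cat sleeps on the mat",
    "french": "le renard brun saute par dessus le chien paresseux et le chat dort sur le tapis",
})

print(identify("the dog and the cat", models))
print(identify("le chien et le chat", models))
```

Because the identifier only sees recognizer output, recognition errors and short texts both reduce the number of reliable trigrams, which is consistent with the effects the paper reports studying.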

    Unconstrained Bengali handwriting recognition with recurrent models

    This paper presents a pioneering attempt to develop a recurrent neural network based connectionist system for unconstrained Bengali offline handwriting recognition. The major challenge in configuring such a classification system for a complex script like Bengali is to define the character classes effectively. A novel way of defining character classes is introduced, making the recognition problem suitable for a recurrent model. Indeed, the system has to deal with more than nine hundred character classes whose occurrence probabilities are very skewed in the language. An off-the-shelf BLSTM-CTC recognizer is used. An open-source dataset is developed for unconstrained Bengali offline handwriting recognition. The dataset contains 2,338 handwritten text lines comprising about 21,000 words. Experiments show that, with the new definition of character classes, the BLSTM-CTC provides impressive performance for unconstrained Bengali offline handwriting recognition. The character-level recognition accuracy is 75.40% without any post-processing of the BLSTM-CTC output. Among the 24.60% character-level errors, the substitution, deletion and insertion errors are 18.91%, 4.69% and 0.98%, respectively.
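Character-level accuracy figures of this kind are conventionally derived from an edit-distance alignment between the reference and the recognized text. Here is a minimal sketch of that computation, assuming the usual definition accuracy = 1 - (S + D + I) / N with N the number of reference characters; the abstract does not state the exact scoring tool, and the function names are hypothetical.

```python
def edit_operations(reference, hypothesis):
    """Return (substitutions, deletions, insertions) of one minimum-cost
    alignment between the reference and hypothesis sequences."""
    n, m = len(reference), len(hypothesis)
    # cost[i][j] = (total, subs, dels, ins) for reference[:i] vs hypothesis[:j]
    cost = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, i, 0)  # delete every reference symbol
    for j in range(1, m + 1):
        cost[0][j] = (j, 0, 0, j)  # insert every hypothesis symbol
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            total, subs, dels, ins = cost[i - 1][j - 1]
            if reference[i - 1] == hypothesis[j - 1]:
                diagonal = (total, subs, dels, ins)
            else:
                diagonal = (total + 1, subs + 1, dels, ins)
            total, subs, dels, ins = cost[i - 1][j]
            deletion = (total + 1, subs, dels + 1, ins)
            total, subs, dels, ins = cost[i][j - 1]
            insertion = (total + 1, subs, dels, ins + 1)
            # Tuples compare on total cost first, so the minimum is optimal.
            cost[i][j] = min(diagonal, deletion, insertion)
    _, subs, dels, ins = cost[n][m]
    return subs, dels, ins


def character_accuracy(reference, hypothesis):
    """Accuracy = 1 - (S + D + I) / N, where N = len(reference)."""
    subs, dels, ins = edit_operations(reference, hypothesis)
    return 1.0 - (subs + dels + ins) / len(reference)


print(edit_operations("kitten", "sitting"))     # (2, 0, 1)
print(character_accuracy("kitten", "sitting"))  # 0.5
```

Since insertions count as errors but do not add to N, this accuracy can fall below zero for very noisy output.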

    Gabor Features for Real-Time Road Environment Classification

    In this paper we present a methodology for using Gabor response features for real-time visual road environment classification. Processing Gabor filters on hardware dedicated solely to this task enables improved real-time texture classification. Using such hardware, we successfully extract Gabor feature information for a four-class road environment classification problem. We use summary histograms as an intermediate level of texture representation prior to final classification. Overall, we obtain a maximum correct classification rate of approximately 98%, outperforming prior work in the field. Index Terms: random forests, Gabor filters, histograms, road environment, scene classification, real-time
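As a rough software illustration of the pipeline described above, here is a minimal sketch that builds the real part of a Gabor kernel, filters a small image, and summarizes the responses as a normalized histogram. The parameter values, image and function names are assumptions; the paper runs the filtering on dedicated hardware and feeds the histograms to a classifier, neither of which is reproduced here.

```python
import math


def gabor_kernel(size, wavelength, theta, sigma, gamma=0.5, psi=0.0):
    """Real part of a Gabor filter sampled on a size x size grid (size odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates by the filter orientation theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength + psi))
        kernel.append(row)
    return kernel


def filter_responses(image, kernel):
    """Valid-mode correlation of a 2-D list image with the kernel."""
    k = len(kernel)
    height, width = len(image), len(image[0])
    return [
        sum(kernel[a][b] * image[i + a][j + b] for a in range(k) for b in range(k))
        for i in range(height - k + 1)
        for j in range(width - k + 1)
    ]


def summary_histogram(values, bins, low, high):
    """Normalized histogram; values outside [low, high] go to the end bins."""
    counts = [0] * bins
    width = (high - low) / bins
    for value in values:
        index = int((value - low) / width)
        counts[min(max(index, 0), bins - 1)] += 1
    total = len(values) or 1
    return [count / total for count in counts]


# Hypothetical 8x8 image of vertical stripes (period 4 pixels).
image = [[1.0 if x % 4 < 2 else 0.0 for x in range(8)] for _ in range(8)]
kernel = gabor_kernel(size=5, wavelength=4.0, theta=0.0, sigma=2.0)
responses = filter_responses(image, kernel)
histogram = summary_histogram(responses, bins=8, low=-4.0, high=4.0)
print(histogram)
```

A full feature vector would typically concatenate such histograms over several orientations and wavelengths before classification.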